Large instruction window processors achieve high performance by exposing large amounts 
of instruction-level parallelism. However, accessing the large hardware structures typically 
required to buffer and process such instruction window sizes significantly degrades the cycle 
time. This paper proposes a novel Checkpoint Processing and Recovery (CPR) 
microarchitecture, and shows how to implement a large instruction window processor without 
requiring large structures, thus permitting a high clock frequency. We fo ... 
Low-power embedded processors utilize compact instruction encodings to achieve small 
code size. Instruction sizes of 8 to 16 bits are common. Such encodings place tight 
restrictions on the number of bits available to encode operand specifiers, and thus on the 
number of architected registers. The central problem with this approach is that performance 
and power are often sacrificed as the burden of operand supply is shifted from the register 
file to the memory due to the limited number of register ... 
Modern processors employ a large amount of hardware to dynamically detect parallelism in 
single-threaded programs and maintain the sequential semantics implied by these 
programs. The complexity of some of this hardware diminishes the gains due to parallelism 
because of longer clock period or increased pipeline latency of the machine. In this paper we 
propose a processor implementation which dynamically schedules groups of instructions 
while executing them on a fast simple engine and caches them f ... 

Pseudo vector processor based on register-windowed superscalar pipeline 
K. Nakazawa, H. Nakamura, H. Imori, S. Kawabe 
Intergraph's CLIPPER microprocessor is a high performance, three chip module that 
implements a new instruction set architecture designed for convenient programmability, 
broad functionality, and easy future expansion. 
Self-testing manufacturing defects in a system-on-a-chip (SOC) by running test programs 
using a programmable core has several potential benefits, including at-speed testing, low 
DfT overhead due to elimination of dedicated test circuitry, and better power and thermal 
management during testing. However, such a self-test strategy might require a lengthy test 
program and might not achieve a high enough fault coverage. We propose a DfT methodology to 
improve the fault coverage and reduce the test p ... 
Ing-Jer Huang 
This paper presents a hardware/software co-synthesis approach to pipelined ISP (instruction 
set processor) design. The approach synthesizes the pipeline structure from a given 
instruction set architecture (behavioral) specification. In addition, it generates a set of 
reordering constraints that guides the compiler back-end (reorderer) to properly schedule 
instructions so that possible pipeline hazards are avoided and throughput is improved. Co- 
synthesis takes place while resolving ... 

Keywords: compiler instruction optimization, instruction set processor, pipeline hazards, 
terms 

Higher microprocessor frequencies accentuate the performance cost of memory accesses. 
This is especially noticeable in Intel's IA32 architecture, where the lack of registers results in 
an increased number of memory accesses. This paper presents a novel, non-speculative 
technique that partially hides the increasing load-to-use latency, by allowing the early issue 
of load instructions. Early load address resolution relies on register tracking to safely 
compute the addresses of memory refere ... 
Code optimization and scheduling for superscalar and superpipelined processors often 
increase the register requirement of programs. For existing instruction sets with a small to 
moderate number of registers, this increased register requirement can be a factor that limits 
the effectiveness of the compiler. In this paper, we introduce a new architectural method for 
adding a set of extended registers into an architecture. Using a novel concept of connection, 
this method allows the data stored in ... 
June 2001 Proceedings of the 15th international conference on Supercomputing 
In this paper, we present a new framework for selecting, duplicating and sequencing 
instructions so as to decrease register pressure. The motivation for this work is to target 
current and future high-performance processors where reductions in register pressure in the 
compiled programs can lead to improved performance. 

For instruction selection and duplication, a unique feature of our approach is the ability to 
perform these transformations on intermediate-language instru ... 
In this paper, we propose a multithreaded processor architecture which improves machine 
throughput. In our processor architecture, instructions from different threads (not a single 
thread) are issued simultaneously to multiple functional units, and these instructions can 
begin execution unless there are functional unit conflicts. This parallel execution scheme 
greatly improves the utilization of the functional unit. Simulation results show that by 
This paper describes the architecture for issuing multiple instructions per clock in the 
NonStop Cyclone Processor. Pairs of instructions are fetched and decoded by a dual two- 
stage prefetch pipeline and passed to a dual six-stage pipeline for execution. Dynamic 
branch prediction is used to reduce branch penalties. A unique microcode routine for each 
pair is stored in the large duplexed control store. The microcode controls parallel data paths 
optimized for executing the most frequent instr ... 
It is shown that the concept of context-dependent machine instructions may be used in the 
architectural design of processors with short wordlengths, such as 8-bit microprocessors, in 
order to increase the capabilities of such machines above those of currently available 
models. 